Regulation of early diagnosis and prognostic markers of lung adenocarcinoma in immunity and hypoxia

Lung adenocarcinoma is still cancer with the highest mortality. Hypoxia and immunity play an essential role in the occurrence and development of tumors. Therefore, this study is mainly to find new early diagnosis and prognosis markers and explore the relationship among the markers and immunity and hypoxia, to improve the prognosis of patients. Firstly, based on the clinical database in TCGA, we determined the most critical clinicopathological parameters affecting the prognosis of patients through a variety of analysis methods. According to pathological parameters, logistic most minor absolute contraction selection operator (lasso), univariate and multivariate regression analysis, the risk genes related to early prognosis were screened, and the risk model was established. Then, in different risk groups, GSEA and CIBERSORT algorithms were used to analyze the distribution and enrichment of the immune cells and hypoxia, to study the effects of early prognostic indicators on hypoxia and immunity. At the same time, we analyzed the different levels of risk genes in normal cells (BSEA-2B) and tumor cells (H1299, A549, PC9, and H1975). Finally, A549 and PC9 cells were induced by CoCl2 to establish a hypoxic environment, and the correlation between risk genes and HIF1A was analyzed. The risk model based on risk genes (CYP4B1, KRT6A, and FAM83A) was accurate and stable for the prognosis of patients. It is closely related to immunity and hypoxia. In BSEA-2B cells, the mRNA and protein expression of CYP4B1 was higher, while the expression of KRT6A and FAM83A was lower. Finally, we found that FAM83A and HIF1A showed a significant positive correlation when A549 and PC9 cells were exposed to hypoxia. The discovery of early diagnostic markers related to immunity, hypoxia, and prognosis, provides a new idea for early screening and prognostic treatment of lung adenocarcinoma.

Lung cancer is a disease with the highest mortality among all cancers, and non-small cell lung cancer accounts for about 85% of lung cancer 1 . Lung adenocarcinoma is the most common histological category of non-small cell lung cancer; It is rising among younger men and women 2,3 . Although surgery and combination therapy have been improving in recent years, the five-year survival rate of patients is still less than 20% [4][5][6][7][8] . Through a variety of modern methods, the prognosis of patients can be diagnosed and analyzed. Therefore, the TMN Stage has a higher five-year survival rate than other Stages 9 . Therefore, finding new early molecular prognostic markers can quickly and accurately determine the Stage of TMN, which is a very profitable strategy for patient treatment. Identifying the PD-1/PD-L1 immune checkpoint pathway as a therapeutic target for inducing an immune response has shown excellent prospects to respond to tumor cells 10 . However, this treatment is effective only for some (20-30%) patients [11][12][13][14] . At the same time, in most solid tumors, hypoxia is a crucial factor to promote the survival and adaptation of tumor cells and help cancer cells progress 15 . Therefore, Therefore, the development of new early prognostic markers related to immunity and hypoxia may improve the prognosis of patients with lung adenocarcinoma.
In this study, early diagnostic markers were screened using the gene transcriptome of the regular group and TMN Stage and the protein database of CPTAC, respectively. Then, risk genes were filtered from the early Simulated cell hypoxia environment. The cells were planted overnight in a six well plate with a cell density of about 60-70%. The next day, 200 µ mol CoCl2 was added to each well. After 24-36 h, the protein was extracted and WB experiment was carried out. In the cell culture system, cobalt chloride (CoCl2) is a substance that induces cell hypoxia. CoCl2 inhibits the catalysis of proline hydroxylase so that the cells are in a state similar to hypoxia 18-20 . Western blotting and RT qPCR experiment. The cells were cultured for 24 h, 1 ml Trizol was added, fully mixed and dissolved, operated according to the kit, and analyzed by RT-PCR. The cells were treated with complete protease inhibitor and Ripa lysate to extract protein. The protein concentration was obtained through the protein quantitative kit. Then, the electrophoresis instrument was modulated and stabilized at 70 V and 110 V for 90 and 60 min respectively, and the target protein was separated. Then, the steady flow 220A was modulated and the membrane was turned for 120 min. Then, the whole membrane was sealed with rapid sealing solution for 30 min. The ratio of working solution of primary antibody and secondary antibody was 1:1000 and 1:3000 respectively. Similarly, The incubation time of the target PVDF membrane is 12 h at 4 °C, and 2H at room temperature respectively. In the middle, the membrane needs to be washed three times for 10 min each time.
Due to the small scope of cutting the PVDF membrane, the target antibody may be located at the boundary, but it has little effect on the overall result. www.nature.com/scientificreports/ Statistical analysis. The "survivalroc" package of R (version 4.1.2) was used to analyze the ROC curve, and the "survivminer" and "glmnet" were combined for univariate, multivariate, lasso cox regression and survival difference analysis. The "RMS" package was used to draw the nomogram and calibration diagram, and then the "limma" package was used to analyze the difference of gene expression data. Other R packages that draw graphs of biological information related differences and correlations, including "ggpubr", "pheatmap", "ggplot2", "ggpubr", "ggextra" and "corrplot" packages in R software. Graphpad prism 8 is used to analyze and draw WB and PCR results.
Ethics approval. The data used in this study were obtained from publicly available datasets, such as the GEO database (HTTPS:// www. NCBI. nlm. nih. gov/ geo), and The Cancer Genome Atlas (HTTPS:// portal. GDC. cancer. gov). The KEGG and go pathway analysis used was from David database (https:// david. ncifc rf. gov/ summa ry. jsp).

Result
Determine the most significant independent prognostic clinical factors. In univariate and multivariate analysis, we found that only the p values of TMN Stage were all less than 0.05 (Fig. 1A 1C), and the AUC value of TMN Stage was the highest. They are divided into two groups according to the TMN StageI and TMN StageII-IV. The survival rates of the two groups were significantly different, the prognosis of TMN StageII-IV was worse than that of other groups (p < 0.001) (Fig. 1D). www.nature.com/scientificreports/ Screening early diagnostic genes. Based on the normal and Stage groups in TCGA data, we used the Wilcoxon test method in R language to screen 7333 significant genes (Fig. 1E,F), and then differentially expressed between StageI and StageII-IV groups to obtain 296 significant genes (Fig. 1H). We also used the limma package in R language to analyze the differences between normal and tumor groups in the CPTAC database, To obtain 2234 significant genes (Fig. 1G). Finally, 38 early diagnosis genes (edgs) were obtained by the intersection of the above three groups of significant genes (Fig. 1I).
Screening early diagnosis and prognosis genes and establishing an early prognosis model. Firstly, 24 prognostic related edgs (edgps) ( Fig. 2A) were screened by univariate Cox regression analysis, then, three early diagnostic markers (risk genes) with the greatest impact on prognosis edgps ( Fig. 2B-D) were screened and calculated as risk scores based on lasso Cox regression dimensionality reduction and multivariate Cox regression analysis, to establish early prognostic risk model (edgpsl) 16 . The calculation formula of risk score is as follows: risk score = (−0.09 × Expression CYP4B1) + (0.1238 × Expression value of KRT6A) + (0.1022 × Expression value of FAM83A). Through the "survival" package in R language, the best threshold of clinical prognosis is selected from the risk score, which is used as the basis for the high-risk and lowrisk groups. Through KM survival analysis, the study found that the prognosis of high-risk groups is very low, compared with the low-risk group in TCGA (  www.nature.com/scientificreports/ Correlation between risk score and clinical features. We found that the risk scores were mainly distributed in the T2-4 Stage (Fig. 3D), N1-3 Stage (Fig. 3F), and TMN StageII-IV (Fig. 3C). However, there were no significant differences in the distribution of risk scores among age ( Then, by analyzing the difference of immune cell composition between high and low-risk groups (Fig. 4A), we found that NK cell resting Macrophage M1, Neutrophil, Mast cell resting, macrophage M0 and T cell CD4 + memory activated were mainly distributed in the high-risk group, while the immune cells mainly distributed in the low-risk group were mast cell activated, T cell CD4 + memory resting, myeloid dendritic cell resting, B cell memory, and Monocyte. Finally, compared with the low-risk group, the expression of immune-related genes programmed death-ligand 1 (PD-L1) and programmed cell death protein 1(PD1) were higher in the high-risk group (Fig. 4C). In the low-risk group, the expression levels of interleukin-4 (IL-4) and surface molecule CD20 (CD20) were higher than those in the high-risk group (Fig. 4D).
Hypoxia effects in the risk model. Based on the HALLMARK data set in the GSEA tool, we analyzed the functional enrichment of high and low-risk groups under the risk model. The high-risk group was mainly enriched in hypoxia, glycolysis, and PI3K-Akt-mTOR signal pathway ( Fig. 5A-C). The mutation load of the high-risk group was higher than that of the low-risk group (Fig. 5D). At the same time, hypoxia-inducible factor A (HIF1A) and lactate dehydrogenase A (LDHA) were higher in the high-risk group than in other groups (Fig. 5E). In the high HIF1A group, the expression levels of FAM83a and KRT6A were higher, while CYP4B1 was the opposite (Fig. 5F). In addition, FAM83A and KRT6A were positively correlated with HIF1A, while CYP4B1 was contrary (Fig. 5G).

Establish and verify nomogram based on TMN Stage and risk score. Univariate and multivariate
Cox regression analysis of multiple clinical factors and risk scores showed that TMN Stage and risk score had a significant impact on the prognosis of patients.  6B). AUC values of various clinical factors and risk scores were analyzed in the "TIMER ROC" package in R language, among which TMN Stage (AUC = 0.669) and risk score (AUC = 0.711) is     (Tables 1, 2, 3). Compared with normal and StageI groups, KRT6A and FAM83A were higher in StageII-IV groups, while CYP4B1 expression levels were opposite (Fig. 8A-C). Similarly, in the analysis of the CPTAC dataset in the UALCAN database, the protein expression levels of KRT6A and FAM83A were higher in the tumor group, while CYP4B1 was the opposite (Fig. 8D-F). The mRNA and protein expression levels of FAM83A and KRT6A in tumor cells A549, PC9, and H1975 were higher than those of normal cells BSE-2B, while the mRNA and protein expression level of CYP4B1 in normal cells BSE-2B was higher than that of tumor cells H1299, PC9, A549 and H1975 (Fig. 9A-E) (*p < 0.05, **p < 0.01, ***p < 0.001).
Correlation between risk genes and HIF1A in a hypoxic environment. HIF1A increased significantly after CoCl2 induced A549 and PC9 cells. Meanwhile, in A549 cells, KRT6A and FAM83A increased with HIF1A. In PC9 cells, only FAM83A increased with HIF1A. The changing trend of CYP4B1 is not apparent (Fig. 10 A-B).

Discussion
This study evaluated the prognosis and accuracy of various clinicopathological parameters through the univariate, multivariate Cox regression and ROC curve analysis. In different TMN Stage groups, there was a significant survival difference between Stage I and Stage II-IV groups; TMN Stage has the most noticeable impact on the prognosis of patients. In the past study, we also found that TMN Stage can predict the survival probability of patients 9 . At the same time, LUAD, as a fatal malignant tumor, can lead to the five-year survival rate of patients being still very low 21 . In conclusion, we can improve the prognosis of patients by grouping TMN Stage to obtain new potential early diagnostic and prognostic markers. Thirty-eight early diagnosis genes (edgs) were selected    www.nature.com/scientificreports/ from three groups of genes with significant differences, including StageI v Normal, StageI v II-IV, and CPTAC. Then, based on the univariate, lasso, and multivariate Cox regression analysis, three risk genes were selected from edgps and calculated to establish a risk model. The three risk genes were CYP4B1, FAM83A, and KRT6A, respectively. As an extrahepatic form of cytochrome P450, CYP4B1 is mainly highly expressed in the lung and a small amount in other organs 22 . Decreased CYP4B1 is associated with poor prognosis in patients with bladder cancer 23 . Inhibit CYP4B1 can promote the occurrence of lung adenocarcinoma by preventing metabolism, enhancing DNA replication, and cell cycle activity; when the specific situation is unknown, it needs to be verified by experiments 24 . FAM83A is a member of an 8-member protein family. They have a highly conserved N-terminal domain with the same function and unknown function called the duf1669 domain 25 . FAM83A has been proved to promote the proliferation, invasion, stem cell-like characteristics, and drug resistance of lung cancer, breast cancer, and pancreatic cancer [26][27][28][29][30][31] . Keratin 6A (KRT6A) is a type II keratin involved in the epimerization of squamous epithelium 32,33 . Recent studies found that KRT6A plays a vital role in cell migration, especially keratinocyte migration. Downregulation of KRT6A expression can inhibit cell invasion and metastasis of nasopharyngeal carcinoma. In lung adenocarcinoma, a high KRT6A level is associated with poor prognosis and can promote the growth and metastasis of lung adenocarcinoma by inducing epithelial-mesenchymal transformation [34][35][36][37][38] . Interestingly, using OS, DSS, DFI, and PFI related univariate Cox regression analysis in various tumors, we found that FAM83A, CYP4B1, and KRT6A had the most significant impact on the survival and progression of lung adenocarcinoma. In the normal tissue and TMN Stage group, FAM83A and KRT6A were the highest in the TMN Stage II-IV group, while CYP4B1 was the opposite. In normal and tumor cells, FAM83A and KRT6A were expressed higher in most tumor cells than in normal cells, while CYP4B1 was the opposite. In conclusion, the three risk genes significantly impact the early diagnosis and prognosis of patients with lung adenocarcinoma. The patients were divided into high and low-risk groups, according to the risk score in the early prognosis model. www.nature.com/scientificreports/ In the correlation analysis of multiple clinicopathological indexes and risk scores, we found that the risk scores in T2-4, N1-3, and StageII-IV groups were generally higher than those in other groups. Then, through univariate and multivariate cox regression analysis, we found that TMN Stage and risk score was the more significant independent prognostic factors. The two independent prognostic factors were used to establish a nomogram with good prediction ability and accuracy. Using the immune score analyzed by the CIBERSORT method on the TIMER website, we found that NK cell resting, macrophage M1, neutrophil, mass cell resting, and macrophage M0 cells were distributed higher in a high-risk group, and the risk score was positively correlated with these immune cells. Mass cell activated, T cell CD4 + memory resting, myeloid dendritic cell resting, B cell memory, and monocyte cells were highly distributed in the low-risk group and negatively correlated with a risk score. Through David's online website's GO function enrichment analysis, the essential immune-related pathway in the low-risk group is a human antimicrobial response; B cell plays a vital role in humoral immunity 39,40 . Interleukin 4 (IL4) and B cell surface receptor CD20, which promote the proliferation and development of B cells, were highly expressed in the risk group [41][42][43][44][45][46] .
Based on the hallmark data set analysis in the GSEA tool, the high-risk group was mainly enriched in hypoxia, PI3K-Akt -mTOR, and the glycolysis signal pathway. At the same time, the HIF1A, LDHA, and mutation load are higher in the high-risk group than in the risk group. In previous studies, tumors in a hypoxic environment are more likely to lead to poor prognosis and mutations 47,48 . Under hypoxic conditions, HIF1A expression increases and induces downstream signal pathways, and PI3K -Akt-mTOR signaling pathway can promote tumor proliferation [49][50][51][52][53] . Glycolysis can enhance tumor proliferation, and it is a crucial enzyme to promote glycolytic activity [54][55][56] . Interestingly, HIF1A can also regulate glycolysis through the PI3K-Akt-mTOR signaling pathway 57 . Through the analysis of KEGG data set in the GSEA tool, the high-risk group is mainly enriched in the signal pathways related to cancer, such as boulder cage, cell cycle, DNA replication, glycolysis gluconeogenesis, pancreatic cage, renal cell carcinoma, and small cell lung. The low expression group was mainly enriched in alpha-linolenic acid metabolism, linoleic acid metabolism, and arachidonic acid metabolism pathway. In related learning, we found that alpha-linolenic acid and arachidonic acid metabolism can inhibit tumors and promote apoptosis [58][59][60][61] . Finally, through RT-PCR and WB, we found that the expression levels of tumor cells (A549, H1975, and A549) in BESA-2B cells were significantly higher than those in FAM83A and KRT6A, while CYP4B1 was the opposite. In addition, FAM83A and HIF1A are positively correlated, after CoCl2 induced hypoxia. This discovery has opened up new ideas for us.
In conclusion, based on various bioinformatics methods and cytological experiments, the screened risk genes can be used as potential early prognostic diagnostic markers of lung adenocarcinoma, which is also closely related to the development of immunity and hypoxia. This series of findings will provide a new idea for the comprehensive treatment of LUAD.

Data availability
Publicly available datasets were analyzed in this study; these can be found in the GEO database (https:// www. NCBI. nlm. nih. gov/ geo), and The Cancer Genome Atlas (https:// portal. GDC. cancer. gov). The authors confirm that the data supporting the findings of this study are available within the article and its Supplementary materials. A statement that all methods and public data are implemented by relevant guidelines and regulations. www.nature.com/scientificreports/